Robust Modified Policy Iteration
Abstract
Robust dynamic programming (robust DP) mitigates the effects of ambiguity in transition probabilities on the solutions of Markov decision problems. We consider the computation of robust DP solutions for discrete-stage, infinite-horizon, discounted problems with finite state and action spaces. We present robust modified policy iteration (RMPI) and demonstrate its convergence. RMPI encompasses both of the previously known algorithms, robust value iteration and robust policy iteration. In addition to proposing exact RMPI, in which the “inner problem” is solved precisely, we propose inexact RMPI, in which the inner problem is solved to within a specified tolerance. We also introduce new stopping criteria based on the span seminorm. Finally, we demonstrate through some numerical studies that RMPI can significantly reduce computation time.
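The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions that do not come from the abstract: rewards are maximized, the uncertainty set for each state-action pair is a finite list of candidate transition distributions (so the inner problem is solved exactly by enumeration), the starting point is a constant vector chosen below the robust value function, and the algorithm stops when the span of the one-step change falls below a user-chosen tolerance `tol`. All names (`robust_mpi`, `worst_case_expectation`, `models`, `m`) are hypothetical. Setting `m = 0` gives a robust value iteration, while a large `m` approximates robust policy iteration.

```python
def worst_case_expectation(candidates, v):
    """Inner problem for a finite uncertainty set (an assumption of this sketch):
    the smallest expected value of v over the candidate distributions."""
    return min(sum(p * x for p, x in zip(dist, v)) for dist in candidates)


def span(x):
    """Span seminorm: max(x) - min(x)."""
    return max(x) - min(x)


def robust_mpi(rewards, models, gamma, m, tol=1e-8, max_iter=10_000):
    """Sketch of robust modified policy iteration for reward maximization.

    rewards[s][a] : immediate reward
    models[s][a]  : list of candidate next-state distributions (hypothetical format)
    m             : number of extra evaluation sweeps per iteration
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])

    def backup(s, a, v):
        return rewards[s][a] + gamma * worst_case_expectation(models[s][a], v)

    # Constant start at r_min / (1 - gamma), so one robust backup cannot decrease it.
    r_min = min(min(row) for row in rewards)
    v = [r_min / (1.0 - gamma)] * n_states

    for _ in range(max_iter):
        # Policy improvement: greedy action against the worst-case model.
        policy, u = [], []
        for s in range(n_states):
            q = [backup(s, a, v) for a in range(n_actions)]
            best = max(range(n_actions), key=q.__getitem__)
            policy.append(best)
            u.append(q[best])

        # Span-based stopping test (threshold choice is an assumption).
        if span([ui - vi for ui, vi in zip(u, v)]) < tol:
            return u, policy

        # Partial robust policy evaluation: m sweeps under the fixed policy.
        w = u
        for _ in range(m):
            w = [backup(s, policy[s], w) for s in range(n_states)]
        v = w

    raise RuntimeError("no convergence within max_iter")


# Toy problem (made up for illustration): 2 states, 2 actions.
rewards = [[1.0, 0.0],
           [2.0, 0.0]]
models = [
    [[[1.0, 0.0]],                  # state 0, action 0: stay in state 0
     [[0.1, 0.9], [0.9, 0.1]]],     # state 0, action 1: ambiguous jump to state 1
    [[[0.0, 1.0]],                  # state 1, action 0: stay in state 1
     [[1.0, 0.0]]],                 # state 1, action 1: move to state 0
]

values, policy = robust_mpi(rewards, models, gamma=0.9, m=5)
print(values, policy)
```

In this toy problem the ambiguous action in state 0 would look attractive under its optimistic candidate distribution, but the worst-case backup makes the safe action preferable, which is the kind of effect robust DP is meant to capture.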
Related articles
A Unified Approach to Algorithms with a Suboptimality Test in Discounted Semi-markov Decision Processes
This paper deals with computational algorithms for obtaining the optimal stationary policy and the minimum cost of a discounted semi-Markov decision process. Van Nunen [23] has proposed a modified policy iteration algorithm with a suboptimality test of MacQueen type, where the modified policy iteration algorithm is a policy iteration method with the policy evaluation routine by a finite number of...
Non-Stationary Approximate Modified Policy Iteration
We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration (a family of algorithms that can interpolate between Value and Policy Iteration) with an error at each iteration is known to lead to stationary policies that are at least 2γ/(1−γ)^2-optimal. Variations of Value and Policy Iteration, that ...
Learning Robust Options
Robust reinforcement learning aims to produce policies that have strong guarantees even in the face of environments/transition models whose parameters have strong uncertainty. Existing work uses value-based methods and the usual primitive action setting. In this paper, we propose robust methods for learning temporally abstract actions, in the framework of options. We present a Robust Options Po...
Accelerating of Modified Policy Iteration in Probabilistic Model Checking
Markov Decision Processes (MDPs) are used to model both non-deterministic and probabilistic systems. Probabilistic model checking is an approach for verifying quantitative properties of probabilistic systems that are modeled by MDPs. Value Iteration, Policy Iteration, and modified versions of them are well-known approaches for computing a wide range of probabilistic properties. This paper tries to impro...
Journal:
- INFORMS Journal on Computing
Volume 25
Publication date: 2013